|
A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for ''low-cardinality columns'', which have a modest number of distinct values, either in absolute terms or relative to the number of records that contain the data. The extreme case of low cardinality is Boolean data (e.g., does a resident of a city have internet access?), which has two values, True and False. Bitmap indexes use bit arrays (commonly called bitmaps) and answer queries by performing bitwise logical operations on these bitmaps. For queries on such data, bitmap indexes have a significant space and performance advantage over other structures. Their drawback is that they are less efficient than traditional B-tree indexes for columns whose data is frequently updated; consequently, they are more often employed in read-only systems specialized for fast queries (e.g., data warehouses) and are generally unsuitable for online transaction processing applications.

Some researchers argue that bitmap indexes are also useful for moderate or even high-cardinality data (e.g., unique-valued data) that is accessed in a read-only manner and queried across multiple bitmap-indexed columns using the AND, OR or XOR operators extensively.〔(Bitmap Index vs. B-tree Index: Which and When? ), Vivek Sharma, Oracle Technical Network.〕 Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables, such as those arranged in a star schema.

A bitmap-based representation can also be used to represent a labeled, directed, attributed multigraph, a data structure used for queries in graph databases.
The article (Efficient graph management based on bitmap indices ) shows how a bitmap index representation can be used to manage large datasets (billions of data points) and answer graph-related queries efficiently.

==Example==
Continuing the internet access example, a bitmap index may be logically viewed as follows: On the left, ''Identifier'' refers to the unique number assigned to each resident, ''HasInternet'' is the data to be indexed, and the content of the bitmap index is shown as two columns under the heading ''bitmaps''. Each column in the left illustration is a ''bitmap'' in the bitmap index. In this case, there are two such bitmaps, one for "has internet" ''Yes'' and one for "has internet" ''No''. Each bit in bitmap ''Y'' shows whether a particular row refers to a person who has internet access. This is the simplest form of bitmap index.

Most columns will have more distinct values. For example, the sales amount is likely to have a much larger number of distinct values. Variations on the bitmap index can effectively index this data as well. We briefly review three such variations.

Note: Many of the references cited here are reviewed at (John Wu (2007)). For those interested in experimenting with some of the ideas mentioned here, many of them are implemented in open source software such as FastBit,〔(FastBit )〕 the Lemur Bitmap Index C++ Library,〔(Lemur Bitmap Index C++ Library )〕 the Roaring Bitmap Java library,〔(Roaring bitmaps )〕 the Apache Hive Data Warehouse system and LucidDB.

Source: Wikipedia, the free encyclopedia.
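
The simplest form of bitmap index described in the internet access example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular database implements it: the five residents, the second ''district'' column, and the helper names `build_bitmap_index` and `matching_rows` are all hypothetical, and plain Python integers stand in for the bit arrays (bit ''i'' corresponds to row ''i'').

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Return {distinct value: bitmap}; bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def matching_rows(bitmap):
    """List the row positions whose bit is set in the bitmap."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical sample data (not taken from the article): five residents.
identifiers = [1, 2, 3, 4, 5]
has_internet = ["Yes", "No", "No", "Yes", "No"]
# A second, made-up low-cardinality column to show a multi-column query.
district = ["North", "North", "South", "South", "North"]

internet_index = build_bitmap_index(has_internet)  # two bitmaps: "Yes" and "No"
district_index = build_bitmap_index(district)      # two bitmaps: "North" and "South"

# Residents in the North district without internet access: one bitwise AND
# of two bitmaps, with no scan of the underlying rows.
result = internet_index["No"] & district_index["North"]
north_without_internet = [identifiers[r] for r in matching_rows(result)]
```

With this sample data, the "Yes" and "No" bitmaps are complementary, and the combined query selects the residents with identifiers 2 and 5. Production libraries typically store the bitmaps in compressed form rather than as raw integers, which this sketch does not attempt.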
|